Revisiting Precision and Recall Definition for Generative Model Evaluation
In this article we revisit the definition of precision-recall (PR) curves for generative models proposed by Sajjadi et al. (arXiv:1806.00035). Rather than providing a single scalar measure of generative quality, PR curves distinguish mode collapse (poor recall) from poor sample quality (poor precision). We first generalize their formulation to arbitrary measures, thereby removing any restriction to finite support. We also expose a bridge between PR curves and the type I and type II error rates of likelihood-ratio classifiers on the task of discriminating between samples of the two distributions. Building upon this new perspective, we propose a novel algorithm for approximating precision-recall curves that shares some interesting methodological properties with the hypothesis-testing technique of Lopez-Paz et al. (arXiv:1610.06545). We demonstrate the advantages of the proposed formulation over the original approach on controlled multi-modal datasets. (Comment: ICML 2019)
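As a minimal sketch of the discrete formulation of Sajjadi et al. that the abstract generalizes (the function name and the way slopes are swept are my own choices, not the paper's):

```python
import math

def prd_curve(p, q, num_angles=11):
    """(alpha(lambda), beta(lambda)) pairs of Sajjadi et al. (arXiv:1806.00035)
    for two probability vectors p, q on the same finite support."""
    pairs = []
    for i in range(1, num_angles):
        lam = math.tan(i / num_angles * math.pi / 2)  # sweep slopes over (0, inf)
        alpha = sum(min(lam * pi, qi) for pi, qi in zip(p, q))
        beta = sum(min(pi, qi / lam) for pi, qi in zip(p, q))
        pairs.append((alpha, beta))
    return pairs

# Identical distributions reach the ideal corner (1, 1); a mode-collapsed q
# (all mass on one of two equal modes) caps one coordinate of the curve at 0.5.
ideal = prd_curve([0.5, 0.3, 0.2], [0.5, 0.3, 0.2], num_angles=2)
collapsed = prd_curve([0.5, 0.5], [1.0, 0.0], num_angles=20)
```

Mode collapse shows up exactly as the abstract describes: one coordinate of the curve (recall, in the usual reading) never exceeds the mass of the covered mode.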
A Fast Multi-Layer Approximation to Semi-Discrete Optimal Transport
The optimal transport (OT) framework has been widely used in inverse imaging and computer vision problems as a way to incorporate statistical constraints or priors. In recent years, OT has also been used in machine learning, mostly as a metric to compare probability distributions. This work addresses the semi-discrete OT problem, where a continuous source distribution is matched to a discrete target distribution. We introduce a fast stochastic algorithm that approximates such a semi-discrete OT problem with a hierarchical multi-layer transport plan. This method allows for tractable computation in high-dimensional settings and for large point clouds, both during training and at synthesis time. Experiments demonstrate its numerical advantage over multi-scale (or multi-level) methods. Applications to fast exemplar-based texture synthesis based on patch matching with two layers also show striking improvements over previous single-layer approaches. This shallow model achieves results comparable to state-of-the-art deep learning methods while being very compact, faster to train, and requiring a single image during training instead of a large dataset.
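The hierarchical plan is the paper's contribution, but the single-layer stochastic baseline it accelerates can be sketched as averaged stochastic ascent on the semi-discrete dual (squared cost, 1-D toy; the function name, step size, and toy setup are illustrative assumptions, not the paper's algorithm):

```python
import random

def semi_discrete_ot_dual(sample_source, targets, weights, n_iters=20000, seed=0):
    """Averaged SGD on the semi-discrete OT dual:
    maximize  E_x[ min_j (x - y_j)^2 - v_j ] + sum_j w_j v_j  over potentials v."""
    rng = random.Random(seed)
    m = len(targets)
    v = [0.0] * m       # dual potentials, one per discrete target
    v_avg = [0.0] * m   # Polyak-style running average of the iterates
    for t in range(1, n_iters + 1):
        x = sample_source(rng)
        # Laguerre cell of x: nearest target once shifted by the potentials.
        j_star = min(range(m), key=lambda j: (x - targets[j]) ** 2 - v[j])
        lr = 0.5 / t ** 0.5
        for j in range(m):
            v[j] += lr * (weights[j] - (1.0 if j == j_star else 0.0))
            v_avg[j] += (v[j] - v_avg[j]) / t
    return v_avg

# Uniform source on [0, 1], two equally weighted targets.
v = semi_discrete_ot_dual(lambda rng: rng.random(), [0.25, 0.75], [0.5, 0.5])
```

By symmetry the two learned potentials should be nearly equal, which places the Laguerre cell boundary near 0.5.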
Local matching indicators for transport problems with concave costs
In this paper, we introduce a class of indicators that enable the efficient computation of optimal transport plans associated with arbitrary distributions of N demands and M supplies in R in the case where the cost function is concave. The computational cost of these indicators is small and independent of N. Using them hierarchically yields an efficient algorithm.
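A tiny brute-force example (my own illustration, not the paper's indicators) shows why concave costs need dedicated treatment: the optimal matching on the line is monotone for a convex cost but can pair far-apart points when the cost is concave:

```python
from itertools import permutations

def best_matching(xs, ys, cost):
    """Brute-force optimal assignment of demands xs to supplies ys."""
    best = min(permutations(range(len(ys))),
               key=lambda perm: sum(cost(x, ys[j]) for x, j in zip(xs, perm)))
    return list(best)

xs, ys = [0.0, 1.0], [1.1, 2.0]
convex = best_matching(xs, ys, lambda x, y: abs(x - y) ** 2)    # monotone
concave = best_matching(xs, ys, lambda x, y: abs(x - y) ** 0.5) # crossing pays off
print(convex, concave)  # → [0, 1] [1, 0]
```

The concave optimum nests the short pair (1.0, 1.1) inside the long pair (0.0, 2.0); this kind of structure is what efficient methods for concave transport costs must handle.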
Generating Private Data Surrogates for Vision Related Tasks
With the widespread deployment of deep networks in industry, membership inference attacks, i.e. the ability to discern training data from a model, become more and more problematic for data privacy. Recent work suggests that generative networks may be robust against membership attacks. In this work, we build on this observation, offering a general-purpose solution to the membership privacy problem. As the primary contribution, we demonstrate how to construct surrogate datasets, using images from GAN generators labelled with a classifier trained on the private dataset. Next, we show that this surrogate data can further be used for a variety of downstream tasks (here classification and regression) while remaining resistant to membership attacks. We study a variety of GANs proposed in the literature, concluding that higher-quality GANs result in better surrogate data with respect to the task at hand.
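The surrogate construction described above reduces to a short loop; a schematic sketch, with `generator` and `classifier` as hypothetical stand-ins for a trained GAN generator and the private-data classifier:

```python
import random

def build_surrogate(generator, classifier, n_samples, rng):
    """Label GAN samples with a classifier trained on the private data.

    Both `generator` and `classifier` are stand-ins: any callable producing
    a synthetic sample, and any callable mapping a sample to a label.
    """
    surrogate = []
    for _ in range(n_samples):
        x = generator(rng)   # synthetic sample; the private images never leave
        y = classifier(x)    # pseudo-label from the privately trained model
        surrogate.append((x, y))
    return surrogate

# Toy stand-ins: uniform "images" in [0, 1], thresholding "classifier".
rng = random.Random(0)
data = build_surrogate(lambda r: r.random(), lambda x: int(x > 0.5), 100, rng)
```

Downstream models are then trained on `data` alone, which is what shields the private training set from membership inference.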
On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity
The recent advent of powerful generative models has triggered the renewed development of quantitative measures to assess the proximity of two probability distributions. While the scalar Fréchet inception distance remains popular, several methods have explored computing entire curves, which reveal the trade-off between the fidelity and the variability of the first distribution with respect to the second one. Several such variants have been proposed independently and, while intuitively similar, their relationship has not yet been made explicit. In an effort to clarify the emerging picture of generative evaluation, we propose a unification of four curves known respectively as: the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve, and a special case of Rényi divergence frontiers. In addition, we discuss possible links between PR / Lorenz curves and the derivation of domain adaptation bounds. (Comment: 10 pages, 3 figures)
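One concrete bridge among these curves can be sketched under simplifying assumptions of my own (finite support, strictly positive reference distribution): sorting the support by likelihood ratio and accumulating the two masses traces the optimal trade-off curve that the PR, Lorenz, and ROC variants reparametrize.

```python
def lr_roc(p, q):
    """Trade-off curve of the likelihood-ratio test between two discrete
    distributions p and q (p assumed strictly positive).

    Outcomes are accepted in decreasing order of q/p; accumulating the two
    masses yields the points (P-mass accepted, Q-mass accepted)."""
    order = sorted(range(len(p)), key=lambda i: q[i] / p[i], reverse=True)
    curve, cp, cq = [(0.0, 0.0)], 0.0, 0.0
    for i in order:
        cp += p[i]
        cq += q[i]
        curve.append((cp, cq))
    return curve

curve = lr_roc([0.5, 0.5], [0.9, 0.1])
```

Because outcomes are taken in decreasing likelihood-ratio order, the slopes along the curve are non-increasing, which is the concavity shared by all four curves unified in the paper.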
Detecting Overfitting of Deep Generative Networks via Latent Recovery
State-of-the-art deep generative networks are capable of producing images with such incredible realism that they can be suspected of memorizing training images. This is why it is not uncommon to include visualizations of training-set nearest neighbors, to suggest that generated images are not simply memorized. We demonstrate that this is not sufficient, which motivates the need to study memorization/overfitting of deep generators with more scrutiny. This paper addresses this question by i) showing how simple losses are highly effective at reconstructing images for deep generators and ii) analyzing the statistics of reconstruction errors when reconstructing training and validation images, which is the standard way to analyze overfitting in machine learning. Using this methodology, this paper shows that overfitting is not detectable in the pure GAN models proposed in the literature, in contrast with those using hybrid adversarial losses, which are amongst the most widely applied generative methods. The paper also shows that standard GAN evaluation metrics fail to capture memorization for some deep generators. Finally, the paper shows how off-the-shelf GAN generators can be successfully applied to face inpainting and face super-resolution using the proposed reconstruction method, without hybrid adversarial losses.
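The reconstruction test can be illustrated on a toy generator (a dependency-free sketch of mine with a linear "generator" and finite-difference gradients; the paper itself optimizes latents of deep networks): a small recovery error indicates the target lies in the generator's range, mirroring the training-versus-validation error comparison.

```python
import random

def recover_latent(G, x_target, dim, steps=500, lr=0.1, seed=0):
    """Latent recovery by gradient descent on ||G(z) - x||^2,
    with gradients taken by central finite differences."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(dim)]
    def loss(zz):
        return sum((a - b) ** 2 for a, b in zip(G(zz), x_target))
    eps = 1e-4
    for _ in range(steps):
        grad = []
        for k in range(dim):
            zp, zm = list(z), list(z)
            zp[k] += eps
            zm[k] -= eps
            grad.append((loss(zp) - loss(zm)) / (2 * eps))
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z, loss(z)

# Toy linear generator: 3-D "images" from 2-D latents.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = lambda z: [sum(a * zk for a, zk in zip(row, z)) for row in A]
_, err_in = recover_latent(G, G([0.3, -0.7]), dim=2)   # in-range: tiny error
_, err_out = recover_latent(G, [1.0, 0.0, 0.0], dim=2) # out of range: large residual
```

The gap between `err_in` and `err_out` is the kind of signal the paper reads off the distribution of reconstruction errors over training versus validation images.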
Unsupervised Image Co-segmentation Using Sinkhorn Distances
In this work, a convex and robust formulation of the unsupervised co-segmentation problem is introduced for pairs of images. The proposed model relies on optimal transport theory to assess the statistical similarity of the segmented regions' features (color histograms in this work). The optimal transport cost is approximated by the Sinkhorn distance, an entropic regularization of optimal transport, to reduce the optimization complexity. A primal-dual algorithm is used to solve the problem efficiently and exactly, without resorting to sub-iterative routines.
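The entropic approximation mentioned above is computed by the standard Sinkhorn scaling iterations (Cuturi, 2013); a minimal sketch on raw histograms (the function name and default parameters are mine):

```python
import math

def sinkhorn_cost(a, b, C, eps=0.1, n_iters=200):
    """Entropically regularized OT cost between histograms a and b,
    with ground-cost matrix C and regularization strength eps."""
    K = [[math.exp(-cij / eps) for cij in row] for row in C]
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scaling to match the row and column marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Cost of the resulting transport plan P[i][j] = u[i] * K[i][j] * v[j].
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(n) for j in range(m))

cost_same = sinkhorn_cost([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
cost_moved = sinkhorn_cost([1.0, 0.0], [0.0, 1.0], [[0.0, 1.0], [1.0, 0.0]])
```

Identical histograms yield a near-zero cost, while moving all mass across a unit-cost edge yields a cost of one; in the co-segmentation model this cost scores the similarity of the two regions' color histograms.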
Matching Local Geometric Descriptors with an A Contrario Method
Many image-analysis applications rely on a representation by local descriptors such as SIFT [3]. Matching these descriptors, although crucial, is most often reduced to thresholding the distance to the nearest neighbor. In this contribution, a new dissimilarity measure that is robust to descriptor quantization is proposed. We then present a matching criterion, inspired by "a contrario" methods [1], which evaluates the significance of the tested correspondences and provides validation thresholds that adapt automatically to the complexity and diversity of the data.
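The a-contrario principle behind such criteria can be sketched in a few lines (a generic number-of-false-alarms rule, not the paper's exact measure; names and the toy background model are assumptions): a match is validated when its expected number of false alarms under a background model stays below a tolerance, so the effective distance threshold adapts automatically to the number of tests performed.

```python
def meaningful_matches(dists, background_cdf, n_tests, eps=1.0):
    """A-contrario validation of candidate matches.

    dists: distances of candidate matches; background_cdf(d): probability
    that a random (background) descriptor pair lies within distance d.
    A match is kept if NFA = n_tests * background_cdf(d) <= eps."""
    return [d for d in dists if n_tests * background_cdf(d) <= eps]

# Toy background: distances uniform on [0, 1], so background_cdf(d) = d.
kept = meaningful_matches([0.0005, 0.1, 0.002], lambda d: d, n_tests=1000)
```

With 1000 tests and eps = 1, the implied threshold is 0.001: only the first candidate survives, and testing more candidates automatically tightens the threshold.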
Methods to Improve Bulk Lifetime in n-Type Czochralski-Grown Upgraded Metallurgical-Grade Silicon Wafers
This paper investigates the potential of three different methods (tabula rasa (TR), phosphorus diffusion gettering (PDG), and hydrogenation) for improving the carrier lifetime in n-type Czochralski-grown upgraded metallurgical-grade (UMG) silicon samples. Our results show that the lifetimes in the UMG wafers used in this study were affected by both mobile metallic impurities and as-grown oxygen precipitate nuclei. Accordingly, the dissolution of grown-in oxygen precipitate nuclei via TR and the removal of mobile impurities via the PDG step were found to significantly improve the electronic quality of the UMG wafers. Finally, we report bulk lifetimes and 1-sun implied open-circuit voltages of the UMG wafers after boron and phosphorus diffusions, as typically applied in n-type cell fabrication. This work has been supported by the Australian Renewable Energy Agency (ARENA) through research grant RND009.